Lift hc2_bm + weights gates via clubSandwich WLS-CR2 port #475
`vcov_type="hc2_bm" + weights` (both one-way and cluster-robust) is now supported, matching `clubSandwich::vcovCR(..., type="CR2") + coef_test(test="Satterthwaite")$df_Satt` and `Wald_test(test="HTZ")$df_denom` at atol=1e-10 on six new weighted scenarios in clubsandwich_cr2_golden.json.
Immediate UX benefit: DifferenceInDifferences, MultiPeriodDiD, and TwoWayFixedEffects now accept `vcov_type="hc2_bm" + survey_design=SurveyDesign(weights=...)` for analytical weights. Closes TODO.md rows 104-105 (open weighted-CR2 gates).
Algorithm note: the diff-diff form matches clubSandwich's specific algebra (W not sqrt(W) in hat matrix, W² in bias term, unweighted residuals in score), NOT a textbook Pustejovsky-Tipton (2018) §3.3 transform-once derivation - the two diverge by 0.5-30% on weighted designs per feedback_wls_cr2_clubsandwich_parity. Satterthwaite DOF uses the full H1/H2/H3 array construction (clubSandwich get_arrays.R::get_GH), not the simpler (tr B)²/tr(B²) form (which is exact unweighted but diverges from clubSandwich on weighted designs by ~6%).
Step 0 R smoke test validated the algorithm at atol=1e-15 before source edits per feedback_r_source_smoke_test_before_implementing.
Unweighted CR2-BM is bit-equal to prior at atol=1e-14 (regression-safe via TestUnweightedRegressionStillBitEqual + TestDOFFormulaDualPathEquivalence asserting the simple and P_array DOF formulas agree at the unweighted limit).
clubSandwich version pin: >= 0.7.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
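The simple `(tr B)²/tr(B²)` form named in the algorithm note can be illustrated with a minimal sketch. The helper names (`matmul`, `trace`, `simple_satterthwaite_dof`) and the list-of-rows matrix representation are assumptions made for illustration; this is not the diff-diff or clubSandwich code, and it covers only the simple ratio, not the H1/H2/H3 array construction.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def simple_satterthwaite_dof(b):
    """Return (tr B)^2 / tr(B^2) for a square matrix B (hypothetical helper)."""
    return trace(b) ** 2 / trace(matmul(b, b))


# B = diag(1, 0.5): (1 + 0.5)^2 / (1 + 0.25)
b = [[1.0, 0.0], [0.0, 0.5]]
dof = simple_satterthwaite_dof(b)

# Rescaling B by a positive constant leaves the ratio unchanged.
b_scaled = [[4.0 * v for v in row] for row in b]
dof_scaled = simple_satterthwaite_dof(b_scaled)
```

Because the numerator and denominator both scale with the square of any constant applied to B, the ratio depends only on the relative spread of B's entries.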
…usters, P2 docstrings)
P0 (weight_type contract gap): The clubSandwich WLS-CR2 port matches the
`pweight` (sampling-weight) convention only. The dispatcher now rejects
`vcov_type="hc2_bm" + weights + weight_type in {"aweight", "fweight"}`
with NotImplementedError pointing to `weight_type="pweight"` or
`vcov_type="hc1"` (CR1 supports all three weight types) as workarounds.
P1 (zero-total-weight clusters): `_compute_cr2_bm` and
`_compute_cr2_bm_contrast_dof` now drop zero-total-weight clusters before
the G>=2 check, raising ValueError when fewer than 2 effective clusters
remain. Mirrors the CR1 zero-cluster handling. Three-cluster fits where
one cluster has all-zero weight silently drop it (its scores contribute
zero anyway).
P2 (stale docstrings): Updated `_validate_vcov_args` and `solve_ols`
docstrings to reflect the lifted-gate + pweight-only scope. Removed the
contradictory "Not supported with weights" claim that survived the gate
lift.
New tests in `tests/test_methodology_wls_cr2.py`:
- TestWLSCR2WeightTypeRejection: 4 tests (aweight/fweight rejections on
cluster and one-way paths; pweight smoke acceptance test).
- TestWLSCR2ZeroWeightClusterRejection: 3 tests (one-zero-cluster reject
on both `_compute_cr2_bm` and `_compute_cr2_bm_contrast_dof`; multi-
cluster with one zero-weight silent drop).
All 53 linalg+methodology tests pass; broader 336-test regression suite
across estimators / TWFE / MPD / SA / vcov_type also clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
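The two guards described in P0 and P1 can be sketched roughly as follows. The function names `check_hc2_bm_weight_type` and `effective_clusters`, their signatures, and the error messages are hypothetical; the sketch only mirrors the behavior stated in this commit message (reject `aweight`/`fweight` on the weighted `hc2_bm` path, drop zero-total-weight clusters, require at least 2 effective clusters).

```python
from collections import defaultdict


def check_hc2_bm_weight_type(vcov_type, weights, weight_type="pweight"):
    """Reject weighted hc2_bm unless the weights are pweights (hypothetical helper)."""
    if vcov_type == "hc2_bm" and weights is not None and weight_type in {"aweight", "fweight"}:
        raise NotImplementedError(
            'weighted hc2_bm supports weight_type="pweight" only; '
            'use weight_type="pweight" or vcov_type="hc1"'
        )


def effective_clusters(cluster_ids, weights):
    """Return cluster labels with positive total weight; require at least 2."""
    totals = defaultdict(float)
    for cid, w in zip(cluster_ids, weights):
        totals[cid] += w
    kept = sorted(cid for cid, total in totals.items() if total > 0)
    if len(kept) < 2:
        raise ValueError("need at least 2 clusters with positive total weight")
    return kept


# Three clusters, one with all-zero weight: cluster "b" is dropped silently.
kept = effective_clusters(
    ["a", "a", "b", "b", "c", "c"],
    [1.0, 2.0, 0.0, 0.0, 0.5, 1.5],
)
```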
…-cluster + scope tightening)
R2 P0: LinearRegression.fit() previously skipped populating self._bm_dof when both effective_cluster_ids AND _fit_weights were present (the path the R1 clubSandwich port lifted), so get_inference() fell back to df = n - k and produced anti-conservative p-values / CIs on the weighted-cluster hc2_bm surface. The dispatcher already guards non-pweight weighted hc2_bm at the linalg validator level, so reaching the _bm_dof branch guarantees a finite Satterthwaite DOF. Drop the weighted-cluster skip and populate _bm_dof from compute_robust_vcov(..., return_dof=True) like the other hc2_bm paths.
R2 P2: new regression tests TestLinearRegressionWeightedClusterHC2BM (2 tests) verify LinearRegression._bm_dof matches compute_robust_vcov-level Satterthwaite DOF and that get_inference(index=i).df threads correctly per coefficient. Sanity check: cluster-driven DOF << n-k (catches future regressions where the fallback would otherwise re-emerge).
R2 P3: stale docstrings at solve_ols (linalg.py:1260) and LinearRegression class docstring (linalg.py:2852) updated to reflect the lifted hc2_bm + pweight surface and the documented aweight/fweight restriction.
R2 P3: CHANGELOG and REGISTRY entries reworded to scope the lift to the analytical surface (compute_robust_vcov / solve_ols / LinearRegression direct callers + analytical CR2 contrast DOF in MPD). Removed the incorrect claim that survey_design= callers benefit directly: survey designs route through the Taylor-series linearization (TSL) survey variance path, which takes precedence over the analytical CR2 sandwich (unchanged).
All 194 linalg/methodology regression tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e comments) R3 was ✅ "Looks good" with only 2 P3 informational items; addressing both for cleanliness. P3 #1 (dead-code thread, estimators.py:1893): The R1 patch threaded `weights=survey_weights` into `_compute_cr2_bm_contrast_dof` on the `not _use_survey_vcov` branch, but that branch only fires when `survey_design=` is unset, in which case `survey_weights` is always None (survey designs always route through the TSL `_use_survey_vcov=True` path). The threading was a no-op and made the surface look like it supported weighted MPD avg_att via survey_design — which it doesn't. Removed the kwarg and updated the comment to reflect the de facto contract on the analytical branch. P3 #2 (R-script comment scope, generate_clubsandwich_golden.R): comments on `weighted_did_absorbed_fe` and `weighted_mpd_avg_att_dof` said the fixtures pin `DiD/MPD(survey_design=SurveyDesign(weights="w"))` paths. Reworded to say these are analytical-CR2 design-matrix parity fixtures on DiD/MPD-shaped designs (the public surface they actually pin is `compute_robust_vcov` / `solve_ols` / `LinearRegression` / the analytical CR2 contrast-DOF helper). All 144 linalg + methodology + estimators-vcov-type tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R4 was ✅ "Looks good" with 2 more P3 informational items; addressing for final cleanliness. P3 #1 (registry overstatement, REGISTRY.md:L2646): The Gate 4-5 lift entry still claimed coverage of "the analytical CR2 contrast DOF used by MultiPeriodDiD.fit() when survey_design= is NOT set and weights are passed via another mechanism" — but MPD has no non-survey weighted public entry point. Reworded to scope the lift to compute_robust_vcov / solve_ols / LinearRegression direct callers, with a separate note that `_compute_cr2_bm_contrast_dof(weights=)` is helper-ready but not exercised by public MPD. P3 #2 (docstring drift, _validate_vcov_args): The Raises block claimed `_validate_vcov_args` itself rejects non-pweight hc2_bm + weights, but the function has no weight_type parameter and the actual enforcement lives in `_compute_robust_vcov_numpy` (which has weight_type in scope). Narrowed the docstring to describe what `_validate_vcov_args` actually validates (conley + weights), with a pointer to where the pweight enforcement happens. All 53 linalg + methodology tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… drift) R5 was ✅ "Looks good" with 1 P3 informational item: the docstring of _compute_bm_dof_from_contrasts still described only the unweighted (tr B)^2 / tr(B^2) formula, but the function body now dispatches the weighted case to the clubSandwich singleton-cluster CR2 P_array form. Split the docstring into Unweighted and Weighted sections matching the two code paths. No code change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…asts (R6 P1)
R6 codex review surfaced a P1: the weighted CR2-BM per-coefficient Satterthwaite DOF disagreed with clubSandwich by 15-30% on `weighted_did_absorbed_fe`'s treated-unit dummies (unit2/3/4), even though vcov matched at machine precision.
Root cause (after instrumentation against R's get_GH and CR2 source): the contrast vectors for high-leverage FE-dummy coefficients project to near-zero on the design (e.g., unit2's dummy column has XW_g0 row 2 = exact zeros for the unit-1 cluster). The resulting per-cluster H_array slices and P_array entries land at the float64 noise floor (~1e-30 for typical matmul-product roundoff at ~1e-16 per entry). The DOF formula `(tr P)² / sum(P²)` is scale-invariant, but R and NumPy use different BLAS reduction orders, producing 1-bit-different roundoff that propagates into 30% DOF disagreement. This is not a fixable algebra bug; it is a fundamental FP precision limit for high-leverage contrasts.
Mitigation: detect the noise floor (per-contrast `max(|P|)` below `1e-10 ×` the largest contrast's `max(|P|)`) and return NaN with a `UserWarning`. This is an honest signal that the DOF cannot be reliably computed, instead of silently shipping BLAS-implementation-dependent inference. The coefficient SEs remain valid; only the affected DOF (and any t-test or CI that depends on it) is suppressed. Documented as a precision limit in REGISTRY.md and CHANGELOG.md.
New regression test `TestWLSCR2FEDoFNoiseGuard` pins the NaN-guard behavior on the weighted_did_absorbed_fe scenario (unit2/3/4 expected NaN; all other 9 coefficients still match clubSandwich at atol=1e-10).
All 339 linalg + estimators + methodology tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
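The batch-relative mitigation described above can be sketched as follows. The function name `guard_dofs`, its arguments, and the warning text are hypothetical; the only parts taken from the commit message are the `1e-10 ×` largest-contrast threshold, the NaN return, and the `UserWarning`.

```python
import math
import warnings


def guard_dofs(dofs, max_abs_p, rel_tol=1e-10):
    """Replace a DOF by NaN when its contrast's max|P| sits at the noise floor.

    dofs      : per-contrast DOF values already computed elsewhere
    max_abs_p : per-contrast max(|P|) magnitudes (assumed inputs)
    """
    largest = max(max_abs_p)
    guarded = []
    for dof, scale in zip(dofs, max_abs_p):
        if scale < rel_tol * largest:
            warnings.warn(
                "DOF suppressed: contrast is at the floating-point noise floor",
                UserWarning,
            )
            guarded.append(math.nan)
        else:
            guarded.append(dof)
    return guarded


# The middle contrast has max|P| ~ 1e-30, far below 1e-10 x 2e-3.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    guarded = guard_dofs([4.2, 7.9, 3.1], [2.0e-3, 1.0e-30, 5.0e-4])
```

A relative threshold of this kind needs at least one well-scaled contrast in the batch to compare against, which is why a call with a single contrast can never trip it.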
…R7 P0)
R7 codex flagged a P0: the noise-floor NaN-guard in _cr2_bm_dof_inner_weighted correctly returns NaN DOF, but LinearRegression.get_inference() converted non-finite _bm_dof to df=None, which safe_inference() then treated as normal-theory inference, producing huge t-stats, p≈0, and zero-width CIs for the guarded coefficients instead of suppression.
Fix: in get_inference(), when _bm_dof[index] is non-finite (NaN), return InferenceResult with NaN t_stat/p_value/conf_int and df=None directly, short-circuiting the normal-theory fallback. SE and coefficient remain valid (vcov matched at machine precision); only the affected coef's small-sample inference is suppressed.
New end-to-end regression test TestLinearRegressionFENanGuardEndToEnd fits the public LinearRegression(vcov_type="hc2_bm", weights=, cluster_ids=) on weighted_did_absorbed_fe and asserts: NaN inference for the 3 treated-unit dummies (the noise-floor cases) AND finite inference for the other 9 coefficients. This catches the exact failure mode R7 surfaced.
Also tightens CHANGELOG/REGISTRY wording (R7 P3): explicitly call out that "vcov + non-noise-floor DOF + compound-contrast DOF match clubSandwich"; high-leverage FE-dummy coefficients are suppressed to NaN.
All 339 linalg/estimators/methodology tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
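The short-circuit described in the fix can be sketched as below. The function `inference_fields` and its dict return value are hypothetical stand-ins for `get_inference()` and `InferenceResult`; the finite branch deliberately stops at the t-statistic, since the p-value and confidence interval need a t-distribution routine that this sketch does not implement.

```python
import math


def inference_fields(coef, se, dof):
    """Suppress DOF-dependent fields when the DOF is non-finite (illustrative only)."""
    if not math.isfinite(dof):
        nan = math.nan
        # Coefficient and SE stay valid; everything downstream of the DOF is NaN.
        return {
            "coef": coef,
            "se": se,
            "t_stat": nan,
            "p_value": nan,
            "conf_int": (nan, nan),
            "df": None,
        }
    # Finite DOF: the p-value and CI would come from a t distribution with
    # `dof` degrees of freedom (omitted here).
    return {"coef": coef, "se": se, "t_stat": coef / se, "df": dof}


suppressed = inference_fields(0.8, 0.2, float("nan"))
finite = inference_fields(0.8, 0.2, 5.0)
```

Returning early keeps the NaN DOF from ever reaching a fallback that would interpret a missing DOF as a request for normal-theory inference.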
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
No findings.
Performance
Maintainability
No separate findings beyond the duplicated CR2 work above.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
Assumption
CI codex on PR #475 (✅ verdict) flagged a real P2: the noise-floor NaN-guard in `_cr2_bm_dof_inner_weighted` was batch-relative only: for a single-contrast call to `_compute_cr2_bm_contrast_dof`, `max|P|_overall` equals the contrast's own max|P|, so the `1e-10 × max|P|_overall` rule could never classify it as degenerate. That left direct single-contrast weighted callers (e.g., MPD avg_att) unprotected: they could still emit BLAS-implementation-dependent finite DOF on noise-floor contrasts even though the registry/changelog said the helper was guarded.
Fix: union the batch-relative criterion with an absolute floor scaled to the bread matrix's magnitude: `(EPS × n × k × max(bread_inv_scale, 1))²`. This covers the worst-case dgemm accumulation roundoff floor for `H1/H2/H3 @ contrast` products. A single-contrast call now correctly fires the NaN-guard on a high-leverage FE-dummy contrast.
New regression tests in `tests/test_methodology_wls_cr2.py::TestWLSCR2SingleContrastNoiseFloor` (2 tests): single weighted FE-dummy contrast triggers NaN-guard + warning; single non-noise contrast still returns finite DOF matching clubSandwich at atol=1e-10.
CI codex P3 (perf): LinearRegression.fit() pays CR2 twice on the new weighted hc2_bm path (solve_ols + compute_robust_vcov). Added as a TODO follow-up row (PR #475 follow-up, Low priority).
All 198 linalg+methodology+estimators-vcov-type+TWFE+SA tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
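A rough sketch of the combined criterion, under stated assumptions: the function names, the argument list, and the choice to compare `max|P|` directly against the absolute floor are illustrative guesses, not the library's code. Only the `1e-10` relative threshold and the `(EPS × n × k × max(bread_inv_scale, 1))²` expression come from the commit message.

```python
import sys

EPS = sys.float_info.epsilon  # float64 machine epsilon


def absolute_floor(n, k, bread_inv_scale):
    """(EPS * n * k * max(bread_inv_scale, 1))^2, as quoted in the commit message."""
    return (EPS * n * k * max(bread_inv_scale, 1.0)) ** 2


def is_noise_floor(max_abs_p, max_abs_p_overall, n, k, bread_inv_scale, rel_tol=1e-10):
    """Union of the batch-relative rule and the absolute floor (assumed comparison)."""
    batch_relative = max_abs_p < rel_tol * max_abs_p_overall
    absolute = max_abs_p < absolute_floor(n, k, bread_inv_scale)
    return batch_relative or absolute


# Single-contrast call: the overall max equals the contrast's own max, so the
# batch-relative rule alone can never fire; the absolute floor still can.
single_noise = is_noise_floor(1e-30, 1e-30, n=100, k=12, bread_inv_scale=1.0)
single_ok = is_noise_floor(1e-3, 1e-3, n=100, k=12, bread_inv_scale=1.0)
```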
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
⛔ Blocker
Executive Summary
Methodology
Code Quality
No findings.
Performance
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
… P0 R2)
CI codex round 2 on PR #475 flagged a P0: weighted clustered CR2 wasn't subpopulation-invariant on mixed-zero clusters. The earlier "drop zero-total-weight clusters" guard handled all-zero clusters but missed mixed-zero clusters (positive total weight, some zero-weight rows inside). In those clusters, zero-weight rows still entered the CR2 adjustment matrices (H_gg, G_g, A_g, bias_term) on the row side, silently changing SE/DOF, contradicting the linalg contract that zero-weight rows are inert.
Fix: physically filter `weights > 0` rows before all per-cluster computations in both `_compute_cr2_bm` and `_compute_cr2_bm_contrast_dof`. The caller's `bread_matrix = X.T @ (X * w[:, None])` is invariant to zero-weight row removal (those rows contribute 0 to the sum), so no bread rebuild is needed. Effective-cluster filter still applies on the filtered view.
New regression tests `TestWLSCR2SubpopulationInvariance` (2 tests):
- `test_per_coefficient_dof_invariant_to_zero_weight_padding`: pin vcov + per-coefficient DOF at atol=1e-12 between (a) computing on the full design with zero-weight padding rows interleaved and (b) computing on the physically dropped positive-weight subset.
- `test_contrast_dof_invariant_to_zero_weight_padding`: same invariance for compound-contrast DOF.
Also addresses two P3 doc items from the same review:
- REGISTRY + CHANGELOG noise-floor description: now explicitly mentions both criteria (batch-relative AND absolute single-contrast safe) instead of only the batch-relative one.
- TODO.md Tier C self-contradiction: marked the WLS-CR2 line as LIFTED with a back-reference to closed rows 109-110, removing the contradiction between the rows table (lifted) and the prose Tier C list (still outstanding).
P3 performance (LinearRegression pays CR2 twice on weighted hc2_bm path) remains tracked in TODO.md row 111 from the previous round.
All 200 linalg+methodology+TWFE+SA+estimators-vcov-type tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
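The claim that the bread matrix is unaffected by dropping zero-weight rows can be checked with a small pure-Python sketch. The helper names `bread` and `drop_zero_weight_rows` and the toy data are assumptions for illustration; the real code works on NumPy arrays.

```python
def bread(x_rows, weights):
    """Accumulate X' W X row by row for a design stored as a list of rows."""
    k = len(x_rows[0])
    out = [[0.0] * k for _ in range(k)]
    for row, w in zip(x_rows, weights):
        for i in range(k):
            for j in range(k):
                out[i][j] += w * row[i] * row[j]
    return out


def drop_zero_weight_rows(x_rows, weights, cluster_ids):
    """Keep only rows with strictly positive weight (hypothetical helper)."""
    keep = [i for i, w in enumerate(weights) if w > 0]
    return (
        [x_rows[i] for i in keep],
        [weights[i] for i in keep],
        [cluster_ids[i] for i in keep],
    )


# Two mixed-zero clusters: each has positive total weight and one zero-weight row.
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
weights = [2.0, 0.0, 1.5, 0.0]
cluster_ids = [0, 0, 1, 1]

x_kept, w_kept, c_kept = drop_zero_weight_rows(x_rows, weights, cluster_ids)
bread_full = bread(x_rows, weights)
bread_kept = bread(x_kept, w_kept)
```

The zero-weight rows add exactly `0.0` to every entry, so the two bread matrices agree; the per-cluster adjustment matrices are where those rows could otherwise leak in, which is what the physical filter prevents.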
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
No P0/P1 findings. The registry now documents the load-bearing methodology choices and precision-limit behavior in
Code Quality
No findings.
Performance
Maintainability
No findings.
Tech Debt
No additional findings beyond the tracked weighted-CR2 double-compute follow-up in
Security
No findings.
Documentation/Tests
No blocking findings. The new methodology harness covers parity, weight-type rejection, effective-cluster rejection, mixed-zero-row subpopulation invariance, single-contrast noise-floor guarding, and I could not execute the suite here because the sandbox lacks NumPy/pytest.
CI codex R3 on PR #475 (✅ verdict) flagged a single actionable P3 docs item: the Phase 1a registry row said "Weighted Bell-McCaffrey is supported" but didn't record the load-bearing `pweight`-only restriction. The code correctly rejects `aweight`/`fweight` at the linalg validator, but the registry overstated the supported surface. Fix: added a sentence to the Phase 1a row stating the clubSandwich WLS-CR2 port matches the `pweight` convention only; `aweight`/`fweight` remain intentionally unsupported (CR1 / `vcov_type="hc1"` still supports all three). CI codex R3 P3 perf (LinearRegression pays CR2 twice on weighted hc2_bm path) is already tracked in `TODO.md` row 111 from R2; no new action. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Adds vcov_type ∈ {"classical","hc1","hc2","hc2_bm"} to StackedDiD with
clubSandwich R parity at atol=1e-10 on the hc2_bm path. Defaults to hc1
(prior behavior at machine precision).
Source edits:
- __init__ accepts vcov_type=, rejects classical/hc2 with cluster-
incompatibility ValueError (StackedDiD clusters intrinsically at unit
or unit_subexp; one-way families don't compose with cluster_ids per
the linalg validator), rejects conley with a deferral message.
- fit() rejects survey_design + vcov_type != "hc1" with
NotImplementedError (survey TSL/replicate-refit overrides analytical
sandwich; SA precedent). Reject order locked: fweight/aweight check
fires first, then vcov check (pinned by test_aweight_plus_hc2_bm_*).
- Main solve_ols call at :419: switched from bake-Q-into-X
(X_t = X*sqrt(Q)) to explicit `solve_ols(X, Y, weights=composed_weights,
vcov_type=self.vcov_type)`. solve_ols internally bakes Q for the coef
solve AND back-transforms for vcov on original-scale data via
clubSandwich's WLS-CR2 algebra for hc2_bm (PR igerber#475).
- _refit_stacked closure at :444: mirror switch (cosmetic per
return_vcov=False but grep-consistency).
- StackedDiDResults gains vcov_type field; get_params() includes it.
- llms-full.txt agent-facing entry updated.
R parity:
- benchmarks/data/stacked_did_test_panel.csv (NEW) — pre-stacked panel
generated by extracting StackedDiD's internal stacked_data via the
fixed-seed Python fixture (n_units=50, n_periods=8, cohorts=[3,5,7],
seed=20260521; 325 rows after stacking).
- benchmarks/R/generate_stacked_did_golden.R (NEW) — loads CSV, fits
lm(weights=Q,...) with the same event-study design as Python, computes
CR1S (Stata-style) + CR2-BM SE + BM df_Satt at cluster=unit AND
cluster=unit_subexp via clubSandwich >= 0.7.0.
- benchmarks/data/stacked_did_golden.json (NEW) — committed goldens.
Tests:
- tests/test_stacked_did.py::TestStackedDiDVcovType (19 tests): default
+ bit-equality + reject paths + survey precedence + get_params/set_params
+ result-class field + clone idempotency + replicate-refit smoke +
reject-order regression + unit_subexp + survey symmetric.
- tests/test_methodology_stacked_did.py (4 tests, NEW): hc1 vs CR1S,
hc2_bm vs CR2 (unit), BM DOF vs coef_test()$df_Satt (via t-dist
inversion of CI half-width), hc2_bm vs CR2 (unit_subexp).
- Tolerance note: hc1 vs prior bake-Q-into-X is bit-equal up to ~2 ULPs
at SE scale (multiplication ordering in solve_ols(weights=) vs prior
user-side bake-w); pinned by test_hc1_se_bit_equal_to_pre_pr_baseline
at atol=1e-13.
Docs:
- REGISTRY.md StackedDiD section: new "Variance families" subsection
documenting hc1/hc2_bm routing + rejects, with **Note:** deferring
methodology framing to Phase 1a clubSandwich port (PR igerber#475) — no new
methodology synthesis introduced in this PR.
- CHANGELOG.md: Unreleased Added bullet at top.
- TODO.md: Phase 1b row updated to track remaining 6 estimators; new
StackedDiD conley follow-up row mirroring SA precedent.
234 tests pass across stacked + methodology_stacked_did +
methodology_sun_abraham + methodology_twfe + methodology_wls_cr2 +
estimators_vcov_type.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
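The tolerance note above (explicit weights versus baking `sqrt(Q)` into the data) can be illustrated with a one-regressor, no-intercept sketch. The function names and toy data are assumptions, and the sketch assumes the outcome is scaled by the same `sqrt(w)` factor as the regressor; it demonstrates only that the two coefficient formulas agree up to floating-point rounding, not the vcov back-transform.

```python
import math


def wls_slope_explicit(x, y, w):
    """Weighted least-squares slope through the origin: sum(w x y) / sum(w x^2)."""
    num = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    den = sum(wi * xi * xi for xi, wi in zip(x, w))
    return num / den


def wls_slope_baked(x, y, w):
    """Same slope via OLS on data pre-multiplied by sqrt(w)."""
    xt = [xi * math.sqrt(wi) for xi, wi in zip(x, w)]
    yt = [yi * math.sqrt(wi) for yi, wi in zip(y, w)]
    return sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
w = [0.5, 2.0, 1.0, 3.0]

slope_explicit = wls_slope_explicit(x, y, w)
slope_baked = wls_slope_baked(x, y, w)
```

The two routes multiply the same quantities in a different order, so they may differ in the last few bits; that is the kind of gap a tight absolute tolerance, rather than exact equality, is meant to absorb.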
Local codex R3 caught a P1 on the same surface: when BM contrast DOF was unavailable (helper raised OR noise-floor NaN guard fired), StackedDiD either fell back to normal-theory inference (silent wrong CIs/p-values under the hc2_bm contract) or passed NaN df through safe_inference (which guards df<=0 but NOT NaN, producing finite t_stat with NaN p/CI, i.e. mixed inconsistent inference fields). Both fail-open patterns contradict the registry contract that hc2_bm aggregated inference uses BM DOF.
Fix mirrors the LinearRegression.get_inference() pattern from PR igerber#475 R7 (linalg.py:3689-3706): on the hc2_bm path, when the per-contrast BM DOF is None or non-finite, emit ALL-NaN inference fields for that contrast rather than falling back to safe_inference(df=None or NaN). Effect and SE remain finite; only the inference fields downstream of the DOF are suppressed.
Applied at both inference sites:
- Event-study per-event-time inference (event_study_effects[h]): inline check before safe_inference call.
- Overall ATT inference (overall_t/p/conf_int): same guard.
New regression tests:
- test_hc2_bm_nan_dof_fails_closed_with_all_nan_inference: monkeypatches `_compute_cr2_bm_contrast_dof` to return a NaN-only vector and asserts ALL event-study + overall t_stat/p_value/conf_int are NaN while effect and se remain finite.
- test_hc2_bm_helper_raises_fails_closed_with_all_nan_inference: monkeypatches the helper to raise LinAlgError and asserts the fallback path NaN-closes inference (does NOT use normal-theory).
77 tests pass. Pattern is consistent with the analytical-surface fail-closed contract added in PR igerber#475 R7.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Release notes consolidate 8 PRs since 3.4.0 (2026-05-19):
Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)
Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)
All changes additive; bit-equal defaults preserved across the affected estimators. No new estimators (patch-level per semver convention).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `NotImplementedError` gates in `_validate_vcov_args` blocking `vcov_type="hc2_bm" + weights` (TODO.md rows 104-105, "Gates 4 and 5")
- `_compute_cr2_bm` / `_compute_cr2_bm_contrast_dof` / `_compute_bm_dof_from_contrasts` (W not √W in hat matrix, W² in bias-correction term, unweighted residuals in score, full H1/H2/H3 array Satterthwaite DOF)
- `LinearRegression._bm_dof` and `LinearRegression.get_inference()`
- `benchmarks/data/clubsandwich_cr2_golden.json` pin Python vs R parity at atol=1e-10 (vcov + non-noise-floor DOF + compound-contrast DOF); existing 6 unweighted scenarios unchanged
- `weight_type ∈ {"aweight", "fweight"}` + `hc2_bm + weights` raises `NotImplementedError` (port matches `pweight` only); zero-total-weight clusters dropped with effective-cluster ≥ 2 guard
Methodology references
- `clubSandwich` v0.7.0 (Pustejovsky 2024) R source: `R/CR-adjustments.R::CR2`, `R/clubSandwich.R::vcov_CR`, `R/coef_test.R::Satterthwaite_df`, `R/get_arrays.R::get_GH`. Foundational papers: Bell & McCaffrey (2002), Pustejovsky & Tipton (2018) JBES, Imbens & Kolesar (2016) ReStat.
- … `feedback_wls_cr2_clubsandwich_parity`), the textbook reading diverges from clubSandwich by 0.5-30% on weighted designs. clubSandwich uses W (not √W) in the hat matrix, W² in the bias term, and unweighted residuals in the score construction. Documented in `docs/methodology/REGISTRY.md` Phase 1a section.
- … `UserWarning` rather than ship BLAS-implementation-dependent values.
Validation
- `tests/test_methodology_wls_cr2.py` (19 tests: clubSandwich parity at atol=1e-10 across 6 weighted scenarios + compound-contrast DOF + unweighted regression safety + dual-path equivalence + weight-type rejection + zero-weight cluster rejection + LinearRegression `_bm_dof` threading + LinearRegression NaN inference end-to-end + FE-dummy noise-floor guard)
- `tests/test_linalg_hc2_bm.py` flipped two "gate raises NotImplementedError" tests to "gate lifted, produces finite vcov+DOF" smoke tests
- … `feedback_r_source_smoke_test_before_implementing`)
- … `linalg + methodology + estimators + TWFE + SA + estimators-vcov-type`
Security / privacy
Test plan
- `pytest tests/test_methodology_wls_cr2.py`
- `pytest tests/test_linalg_hc2_bm.py`
- `pytest tests/test_estimators_vcov_type.py tests/test_methodology_twfe.py tests/test_methodology_sun_abraham.py tests/test_estimators.py`
- … `feedback_local_codex_vs_ci_codex_divergence`)
🤖 Generated with Claude Code